library(data.table) #Load all required packages
data.table 1.11.8 Latest news: r-datatable.com
library(rvest)
Loading required package: xml2
library(lubridate)
Attaching package: ‘lubridate’
The following objects are masked from ‘package:data.table’:
hour, isoweek, mday, minute, month, quarter, second, wday,
week, yday, year
The following object is masked from ‘package:base’:
date
library(dbplyr)
library(DataComputing)
Loading required package: dplyr
Attaching package: ‘dplyr’
The following objects are masked from ‘package:dbplyr’:
ident, sql
The following objects are masked from ‘package:lubridate’:
intersect, setdiff, union
The following objects are masked from ‘package:data.table’:
between, first, last
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Loading required package: ggplot2
library(magrittr)
library(Hmisc)
Loading required package: lattice
Loading required package: survival
Loading required package: Formula
Attaching package: ‘Hmisc’
The following objects are masked from ‘package:dplyr’:
src, summarize
The following object is masked from ‘package:rvest’:
html
The following objects are masked from ‘package:base’:
format.pval, units
library(party)
Loading required package: grid
Loading required package: mvtnorm
Loading required package: modeltools
Loading required package: stats4
Loading required package: strucchange
Loading required package: zoo
Attaching package: ‘zoo’
The following objects are masked from ‘package:base’:
as.Date, as.Date.numeric
Loading required package: sandwich
library(rpart)
Attaching package: ‘rpart’
The following object is masked from ‘package:survival’:
solder
rm(list = ls()) #Clean up environment
This dataset contains policy data for 50 US states and DC from 2001 to 2017. The data include information on state legislation and regulations related to nutrition, physical activity, and obesity in settings such as early care and education centers, restaurants, schools, workplaces, and others. To identify individual bills, use the identifier ProvisionID. A bill or citation may appear more than once because it can apply to multiple health or policy topics, settings, or states.
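Because a bill can appear in several rows, counting rows overstates the number of distinct bills. Below is a minimal sketch of counting by ProvisionID instead. The `toy` table and its values are made up for illustration and are not taken from the CDC file.

```r
library(dplyr)

# Hypothetical rows: provision 101 appears twice because it covers two health topics
toy <- data.frame(
  ProvisionID = c(101, 101, 102, 103),
  HealthTopic = c("Nutrition", "Obesity", "Nutrition", "Physical Activity"),
  stringsAsFactors = FALSE
)

n_rows  <- nrow(toy)                     # number of rows in the table
n_bills <- n_distinct(toy$ProvisionID)   # number of distinct provisions

# One row per provision, keeping the first row seen for each ProvisionID
one_per_bill <- distinct(toy, ProvisionID, .keep_all = TRUE)
```

Whether to count rows or distinct provisions depends on the question: rows measure how often a topic or setting is addressed, while distinct ProvisionID values measure how many bills exist.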
getwd() #Get working directory
[1] "/Users/fun.k/Final-Project"
setwd("~/Downloads") #Set working directory
The working directory was changed to /Users/fun.k/Downloads inside a notebook chunk. The working directory will be reset when the chunk is finished running. Use the knitr root.dir option in the setup chunk to change the working directory for notebook chunks.
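The message above suggests setting the knitr `root.dir` option rather than calling `setwd()` inside a chunk. A minimal sketch, assuming the knitr package is installed and the CSV still lives in `~/Downloads` (adjust the path to your own setup):

```r
# Place this in the notebook's setup chunk so later chunks resolve paths from here.
# "~/Downloads" mirrors the setwd() call above and is only an assumption about
# where the CSV file is stored.
knitr::opts_knit$set(root.dir = path.expand("~/Downloads"))
```

With this in place, the `setwd()` call should no longer be needed, since `root.dir` sets the directory in which knitr evaluates code chunks.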
untidy <- read.csv("CDC_Nutrition__Physical_Activity__and_Obesity_-_Legislation.csv") #Read file in table format and create a data frame from it
gc(reset=TRUE) #Run garbage collection and reset the "max used" memory statistics
used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 2248334 120.1 4117831 220.0 NA 2248334 120.1
Vcells 4726462 36.1 10146329 77.5 16384 4726462 36.1
tracemem(untidy)
[1] "<0x10cac8210>"
untidy <- as.data.table(untidy) #Change from data frame to data table (this makes a copy, as tracemem reports)
tracemem[0x10cac8210 -> 0x109cf6380]: copy as.data.table.data.frame as.data.table
tracemem[0x10cac8210 -> 0x10b892ab0]: as.list.data.frame as.list vapply copy as.data.table.data.frame as.data.table
gc()
used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
Ncells 2249112 120.2 4117831 220.0 NA 2253302 120.4
Vcells 4730373 36.1 10146329 77.5 16384 5204039 39.8
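The tracemem output above shows that `as.data.table()` copies the data frame. data.table also provides `setDT()`, which converts an object in place. A minimal sketch on a made-up data frame (`toy` and its values are hypothetical):

```r
library(data.table)

# Hypothetical two-row data frame standing in for the real file
toy <- data.frame(year = c(2001L, 2002L), state = c("PA", "NY"))

setDT(toy)                    # converts toy to a data.table by reference
is_dt <- is.data.table(toy)   # TRUE once the conversion has happened
```

For a file of this size the copy is cheap, so either approach should work; `setDT()` mainly matters when the data frame is large relative to available memory.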
This dataset can be useful in looking at how legislation varies by year, state, and topic.
untidy #Take a look
#Clean Data
#Change variables to start with lowercase/ make easier to use
Legislation <-
untidy %>%
rename(year = Year, quarter = Quarter, state = LocationAbbr,
       healthTopic = HealthTopic, policyTopic = PolicyTopic,
       dataSource = DataSource, setting = Setting, title = Title,
       status = Status, citation = Citation,
       statusAltValue = StatusAltValue, dataType = DataType,
       comments = Comments, enDate = EnactedDate, effDate = EffectiveDate,
       coordinates = GeoLocation, display = DisplayOrder)
Legislation
We can see how much legislation was passed in each year, which years were the most progressive, which states have more legislation than others, which topics have been most popular, and more. This information can show government officials where certain states need to improve and which topics need more emphasis.
yearlyData <-
Legislation %>%
group_by(year) %>%
summarise(total = n()) %>% #Count the rows recorded for each year
mutate(total = total/1000)
yearlyData %>%
ggplot(aes(x=year,y=total ))+
geom_bar(stat='identity',position='stack', width=.9, fill = "red") +
ylab("Total (1000s)") +
xlab("Year")
Each bar in the chart represents one year. More legislation and regulation was passed in 2011 than in any other year between 2001 and 2017.
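The busiest year can also be checked numerically rather than read off the chart. A minimal sketch on made-up data (the `toy` rows are hypothetical, one row per record):

```r
library(dplyr)

# Hypothetical rows, one per legislation record
toy <- data.frame(year = c(2010, 2011, 2011, 2011, 2012, 2012))

per_year <- toy %>%
  group_by(year) %>%
  summarise(total = n()) %>%   # n() counts the rows in each year group
  arrange(desc(total))         # largest count first

busiest <- per_year$year[1]    # year with the most rows
```

Applied to `Legislation`, the same pipeline would list the years in order of how many records each contains.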
#predictedStatus <-
#Legislation %>%
#sample_n(200) %>%
#rpart(status ~ year + state, data = .)
#rpart.plot(predictedStatus, roundint = FALSE)
Legislation %>%
sample_n(200) %>%
group_by(year, healthTopic, PolicyTypeID) %>%
summarise(total = n()) %>% #Count sampled rows per year, health topic, and policy type
ggplot(aes(x= year, y = total)) +
geom_point(color = "red") +
facet_grid(PolicyTypeID ~ healthTopic) +
ylab("Total") +
xlab("Year")
As we can see, there has been far less regulation than legislation. Nutrition legislation appears to be the most popular, with a total that grows roughly exponentially until about 2012 and not in the years after.
Legislation %>%
group_by(year) %>%
summarise(avg = mean(year)) #mean() computes the average; within a single-year group it equals that year